Sampling properties of random graphs: the degree distribution 
We discuss two sampling schemes for selecting random subnets from a network, random sampling and connectivity-dependent sampling, and investigate how the degree distribution of a node in the network is affected by the two types of sampling. Here we derive a necessary and sufficient condition that guarantees that the degree distributions of the subnet and the true network belong to the same family of probability distributions. For completely random sampling of nodes we find that this condition is fulfilled by classical random graphs; for the vast majority of networks, however, this condition will not be met. We furthermore discuss the case where the probability of sampling a node depends on the degree of that node, and we find that even classical random graphs are no longer closed under this sampling regime. We conclude by relating the results to real E. coli protein interaction network data.
I. INTRODUCTION



Most networks investigated today are parts of much larger networks. These subnets can come in two different forms. First, we can choose a region of a network and consider all nodes in this region and only the edges between these nodes (for example, a connected component of the larger network would be one such subnet). Networks defined by all servers in a country, or the interaction network of all proteins confined to the mitochondria, would be real-world examples […]. Such networks may not be representative of the network as a whole, but they can give valuable insights into communication or biological processes within a defined sphere.
More complicated is a second type of subnet, in which each node of the global network is included in the subnet with a certain probability $p$ and only the connections between pairs of nodes that are both included in the subnet are studied. This type of subnet is radically different from the region-based subnets. It is, however, a frequent scenario in the analysis of technological and biological networks: most studies of molecular networks, such as protein-protein interaction […], gene-regulation and metabolic networks […], test for connections between a subset of the known molecular entities (proteins, genes and enzymes/metabolites, respectively). The process by which these entities (or corresponding probes) are chosen may reflect the bias of the experimenter or merely chance, and this will in turn influence the extent to which the subnet reflects properties of the global network in a meaningful way. In light of the relative straightforwardness of studying the sampling properties of networks, and their obvious importance for the analysis of current network data sets, it is surprising that this problem has not been addressed previously.

FIG. 1: Sampling nodes from the network (top) will give rise to subnets (bottom). If edges are only observed if both nodes incident on an edge are included in the subnet (indicated in dark blue), then the degree distributions (as well as other characteristics) of the subnet and the global network will be different. In the text we show that sometimes, however, the degree distributions of both networks can be related under random sampling of nodes.

Here we will focus on the simplest, and perhaps most parsimonious, process of sampling nodes: the case where each node in the network is included with probability $0 < p < 1$. In the present analysis we will concentrate on the sampling properties of the degree distribution of a network. The degree distribution, henceforth denoted by $\Pr(k)$, specifies the probability for a node to have $k$ connections, $k = 0, 1, \ldots$, and is probably the most common summary statistic used in the analysis of networks […]. In particular, the potential scale-free nature of real networks is often identified from the empirical degree distribution, which for scale-free networks takes on a power-law form, $\Pr(k) \propto k^{-\gamma}$ […]. Frequently a model is considered scale-free if the tail (i.e. for $k$ sufficiently large) of the degree distribution takes such an asymptotic power-law form […]. Here we will consider this case as well as network ensembles with an exact power-law degree distribution. The central question addressed here is whether the degree distribution of randomly sampled subnets has the same properties as the degree distribution of the overall network. Thus far this question has been ignored in the literature but, as we will show, it is of great importance in the analysis of real networks, the vast majority of which are only subnets of larger networks. Unless explicitly stated otherwise we shall consider the thermodynamic limit, $N \to \infty$.



II. THE DEGREE DISTRIBUTION OF A RANDOM SUBNET

A. Sampling from networks

We use $\mathcal{N}$ to denote a network with $N$ nodes (we allow $N \to \infty$) drawn from a statistical ensemble of random networks […] defined by some (potentially vector-valued) parameter $\Omega$, and let $\Pr(k)$ be its degree distribution; the total number of edges is given by $M$. Here we will be especially concerned with the case of a subnet $S$ generated from the global network $\mathcal{N}$ by randomly sampling each node $i \in \mathcal{N}$ with a certain probability $0 < p_i < 1$. Thus if a node of degree $k$ gets picked for inclusion in the subnet, its degree in the subnet will depend on the number of its neighbours which are also included in $S$.

1. Random sampling

We start by considering the case where the probability of picking a node is identical for all nodes, $p_i = p$ for all $i$. Here $p = 0$ and $p = 1$ are the trivial cases for which $S = \emptyset$ and $S = \mathcal{N}$, respectively. Formally, the probability that a node has connectivity $l$ in $S$ given it has connectivity $k$ in $\mathcal{N}$ is

$$\Pr(l \mid k) = \binom{k}{l} p^{l} (1-p)^{k-l}, \qquad 0 \le l \le k, \tag{1}$$

where $\Pr(x \mid y)$ denotes the conditional probability of $x$ given $y$. The degree distribution in the subnet is thus given by

$$\mathrm{Pr}_S(l) = \sum_{k \ge l} \Pr(l \mid k)\Pr(k) = \sum_{k \ge l} \binom{k}{l} p^{l} (1-p)^{k-l} \Pr(k). \tag{2}$$

This is probably the simplest and most parsimonious sampling scheme, and it may also be a reasonably realistic approximation, e.g. in the study of protein interaction networks where experimenters choose a set of proteins in a more or less haphazard fashion. From Eqn. (2) we can show that

$$E_S[l] = E\big[E[l \mid k]\big] = pE[k] = pr, \tag{3}$$

where $r := E[k]$ is the average degree in the network. Similarly, we can show that the $m$-th descending factorial moment (with the descending factorial defined by $x^{[m]} = x(x-1)(x-2)\cdots(x-m+1)$ […]) of the degree distribution of a network obeys

$$E_S\big[l^{[m]}\big] = p^{m}\, E\big[k^{[m]}\big]. \tag{4}$$

Eqns. (3) and (4) hold for all networks, as long as the moments exist; for scale-free networks with exponent $\gamma$, for example, moments of order $m \ge \gamma - 1$ do not exist.

2. Random sampling dependent on degree

A further sampling scheme will be considered here, in which the number of connections directly influences the probability, $\pi(k)$, of sampling a node of degree $k$. In the previous sampling scheme all nodes had the same chance of being sampled, $\pi(k) = p$. We will focus on the particular case of an uncorrelated network.

The connectivity of a node in the subnet thus depends 
on the degrees of its neighbours. The probability that a 
node connected to a randomly chosen edge has degree k 
is given by 



$$\mathrm{Pr}^{*}(k) = \frac{k\Pr(k)}{r}, \tag{5}$$



where $r$ is the average degree in the network; the average degree of the neighbours of a randomly chosen node is thus $E[k^2]/E[k]$, provided the first two moments of the degree distribution exist; below we will limit ourselves to such situations (for finite networks the moments will, of course, exist). Assuming a node is retained in the subnet, the probability of sampling a neighbouring node also depends on that neighbour's connectivity and, in a mean-field approximation, the probability $\bar{p}$ of retaining an edge originating from a node is thus given by



$$\bar{p} = \sum_{k} \frac{k\Pr(k)}{r}\,\pi(k). \tag{6}$$



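The degree distribution of the subnet $S$ is again given by binomial sampling:

$$\mathrm{Pr}_S(l) = \frac{\sum_{k \ge l} \binom{k}{l} \bar{p}^{\,l} (1-\bar{p})^{k-l}\, \pi(k)\Pr(k)}{\sum_{k=0}^{\infty} \pi(k)\Pr(k)}. \tag{7}$$

Defining

$$\mathrm{Pr}_\pi(k) = \frac{\pi(k)\Pr(k)}{\sum_{k'=0}^{\infty} \pi(k')\Pr(k')}, \tag{8}$$

we can rewrite Eqn. (7) in the same form as Eqn. (2). With these probabilities the degree distribution in the subnet is given, analogously to Eqn. (2), by

$$\mathrm{Pr}_S(l) = \sum_{k \ge l} \binom{k}{l} \bar{p}^{\,l} (1-\bar{p})^{k-l}\, \mathrm{Pr}_\pi(k). \tag{9}$$

Obviously, when setting $\pi(k) = p$, Eqn. (9) simplifies to Eqn. (2), since then $\mathrm{Pr}_\pi(k) = \Pr(k)$ and $\bar{p} = p$.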

We still have to specify the functional form of $\pi(k)$; a priori the only constraint is that $\pi(k)$ has to be a probability for all $k$, i.e. $0 \le \pi(k) \le 1$ for all $k = 0, 1, 2, \ldots$. One possible and obvious choice is to let $\pi(k) \propto k$; in order to ensure that $\pi(k) \le 1$ for large $k$ we set

$$\pi(k) = Ck \tag{10}$$

with $C$ sufficiently small that $\pi(k) \le 1$ for large $k$ (we can always trivially set $C = 1/(2E[M])$, with $E[M]$ the expected number of edges in the network). In this case

$$\bar{p} = \frac{C}{r}\sum_{k=1}^{\infty} k^{2}\Pr(k) = C\,\frac{E[k^{2}]}{E[k]}, \tag{11}$$

i.e. $\bar{p}$ depends on the degree distribution solely via the first and second moments of $\Pr(k)$. We will refer to this sampling scheme as preferential sampling of nodes.
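To make Eqns. (6), (8) and (11) concrete, here is a minimal Python sketch. It assumes a Poisson degree distribution truncated at a maximum degree of 80 and uses illustrative values of $\lambda$ and $N$, neither of which comes from the paper; the helper names `poisson_pmf`, `edge_retention` and `biased_pmf` are hypothetical. The sketch computes $\bar p$ directly from Eqn. (6) with $\pi(k) = Ck$ and compares the result with the moment form of Eqn. (11).

```python
import math


def poisson_pmf(lam, k_max):
    """Poisson(lam) probabilities for k = 0..k_max (tail beyond k_max is neglected)."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(k_max + 1)]


def edge_retention(pmf, pi):
    """Mean-field edge-retention probability p_bar of Eqn. (6)."""
    r = sum(k * q for k, q in enumerate(pmf))
    return sum(k * q * pi(k) for k, q in enumerate(pmf)) / r


def biased_pmf(pmf, pi):
    """Pr_pi(k) of Eqn. (8): Pr(k) reweighted by pi(k) and renormalised."""
    w = [pi(k) * q for k, q in enumerate(pmf)]
    z = sum(w)
    return [x / z for x in w]


lam, n_nodes = 3.0, 10_000           # illustrative values only
C = 1.0 / (n_nodes * lam)            # C = 1/(2 E[M]) with E[M] = N * lam / 2


def pi(k):
    """Preferential sampling probability, Eqn. (10)."""
    return C * k


pmf = poisson_pmf(lam, 80)
mean_k = sum(k * q for k, q in enumerate(pmf))
mean_k2 = sum(k * k * q for k, q in enumerate(pmf))

p_bar = edge_retention(pmf, pi)
print(p_bar, C * mean_k2 / mean_k)   # Eqn. (6) versus Eqn. (11)
print(biased_pmf(pmf, pi)[:4])       # Pr_pi(0) = 0: isolated nodes are never sampled
```

For the Poisson case both values of $\bar p$ coincide with $(\lambda+1)/(N\lambda)$, and the reweighted distribution is a Poisson distribution shifted by one.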



B. Probability generating functions of random subnets



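We represent the degree distribution of a network $\mathcal{N}$ through its probability generating function (PGF) […],

$$G(s) = \sum_{k=0}^{\infty} \Pr(k)\, s^{k}. \tag{12}$$

The probability $\Pr(k)$ follows from the PGF via the relationship

$$\Pr(k) = \frac{1}{k!} \left.\frac{d^{k} G(s)}{ds^{k}}\right|_{s=0}. \tag{13}$$

With Eqns. (12) and (2) we can straightforwardly derive the PGF for the subnet:

$$\begin{aligned} G_S(s) &= \sum_{l=0}^{\infty} \mathrm{Pr}_S(l)\, s^{l} = \sum_{l=0}^{\infty} \sum_{k \ge l} \Pr(l \mid k)\Pr(k)\, s^{l} \\ &= \sum_{k=0}^{\infty} \Pr(k) \sum_{l=0}^{k} \binom{k}{l} (ps)^{l} (1-p)^{k-l} \\ &= \sum_{k=0}^{\infty} \Pr(k)\,(1-p+ps)^{k} = G(1-p+ps). \end{aligned} \tag{14}$$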



If nodes with degree $l = 0$ are ignored (as is frequently the case in high-throughput protein interaction data), then after deleting all nodes with $l = 0$ the PGF in the subnet becomes

$$G_S^{*}(s) = \frac{G(1-p+ps) - G(1-p)}{1 - G(1-p)}. \tag{15}$$

Eqns. (14) and (15), respectively, hold generally for the degree distributions of subnets randomly sampled from networks, depending on whether orphaned nodes (i.e. those with connectivity $l = 0$) are allowed or not [17].

Interestingly, if Eqn. (14) holds then Eqn. (15) also holds with $G(s)$ replaced by $G^{*}(s) = (G(s) - \Pr(0))/(1 - \Pr(0))$; i.e. networks with orphaned nodes removed are closed under random sampling if the networks with the orphaned nodes retained are.
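The identities (2), (4), (14) and (15) are easy to check numerically. The Python sketch below is illustrative only: the degree distribution `pmf` is an arbitrary made-up example and all function names are hypothetical. It thins the distribution binomially as in Eqn. (2), then compares the PGFs and descending factorial moments of the subnet with the predictions.

```python
import math


def thin(pmf, p):
    """Pr_S(l) of Eqn. (2): keep each of a node's k neighbours independently with probability p."""
    n = len(pmf)
    return [sum(math.comb(k, l) * p**l * (1 - p)**(k - l) * pmf[k] for k in range(l, n))
            for l in range(n)]


def pgf(pmf, s):
    """G(s) = sum_k Pr(k) s^k, Eqn. (12)."""
    return sum(q * s**k for k, q in enumerate(pmf))


def drop_orphans(pmf):
    """Delete degree-0 nodes and renormalise, as assumed in Eqn. (15)."""
    z = 1.0 - pmf[0]
    return [0.0] + [q / z for q in pmf[1:]]


def factorial_moment(pmf, m):
    """E[k^[m]] with the descending factorial k^[m] = k(k-1)...(k-m+1)."""
    return sum(math.perm(k, m) * q for k, q in enumerate(pmf))


pmf = [0.1, 0.3, 0.25, 0.2, 0.1, 0.05]   # made-up degree distribution, k = 0..5
p = 0.4
sub = thin(pmf, p)
sub_star = drop_orphans(sub)
g_1mp = pgf(pmf, 1 - p)                  # G(1 - p) = Pr_S(0)

for s in (0.0, 0.3, 0.9):
    eq14 = pgf(pmf, 1 - p + p * s)
    eq15 = (eq14 - g_1mp) / (1 - g_1mp)
    print(s, pgf(sub, s) - eq14, pgf(sub_star, s) - eq15)   # both differences ~ 0

for m in (1, 2, 3):
    print(m, factorial_moment(sub, m) - p**m * factorial_moment(pmf, m))   # Eqn. (4)
```

Working with finite lists keeps every sum exact up to floating-point rounding, so any discrepancy larger than about $10^{-15}$ would point to an error in the formulas rather than to truncation.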



III. CLOSURE UNDER RANDOM SAMPLING FROM NETWORKS

A. Conditions for closure: generating function 

From Eqns. (14) and (15) it is apparent that the degree distribution of a subnet $S$ cannot generally be expected to be of the same type (e.g. a Poisson distribution) as the degree distribution of the global network $\mathcal{N}$. For some important types of networks, however, it can be shown that random sampling of nodes gives rise to networks with degree distributions of the same type as the global network, but with a different parameter depending on $p$, i.e. $\Omega' = f(\Omega, p)$. In this case we say that a network (or its degree distribution) is closed under random sampling of nodes. For a network ensemble to be closed under random sampling the following condition is necessary and sufficient [17]:

$$G_S(s;\Omega) = G(s;\Omega') = G(1-p+ps;\Omega), \tag{16}$$

and

$$G_S^{*}(s;\Omega) = G(s;\Omega') = \frac{G(1-p+ps;\Omega) - G(1-p;\Omega)}{1 - G(1-p;\Omega)} \tag{17}$$

when the subnet is not allowed to have orphaned nodes. Necessity and sufficiency follow from Eqns. (14) and (15) and the definition of a closed subnet.
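As a brief worked instance of Eqn. (16), take the Poisson degree distribution of classical random graphs, whose PGF is $G(s;\lambda) = e^{\lambda(s-1)}$:

$$G_S(s;\lambda) = G(1-p+ps;\lambda) = e^{\lambda(1-p+ps-1)} = e^{p\lambda(s-1)} = G(s;\lambda'), \qquad \lambda' = p\lambda,$$

so Eqn. (16) holds with $\Omega' = p\lambda$ and the subnet is again a classical random graph.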



B. Conditions for closure: moments 

Equations (16) and (17) can be applied to all types of degree distributions. Inspired by Eqns. (3) and (4), we here derive a general condition in terms of moments for a subnet to be of the same type as the global network. We assume that the moments determine the degree distribution uniquely (in particular, this implies that all moments exist), which is true under mild regularity conditions. Let an ensemble of random networks be given which is parameterized by $\Omega$. For example, the ensemble of classical or Erdős–Rényi random graphs […] has $\Pr(k) = e^{-\lambda}\lambda^{k}/k!$, and $\Omega = \lambda$ is the average connectivity. We seek a condition that, provided nodes are sampled with probability $p$, ensures that the degree distribution of the subnet remains in the same ensemble of random networks. Without loss of generality we can assume that $\Omega$ has the form $\Omega = (r, \psi)$, where $r$ is the average degree in the network and $\psi$ is an additional (potentially vector-valued) parameter.

From Eqn. (3) we know that the average connectivity in the sampled subnet, $r_p$, is given by $r_p = pr$. We can use Eqn. (4) to show that a family of degree distributions is closed under random sampling of nodes if and only if the descending factorial moments obey the relationship

$$E\big[k^{[m]}\big] = a_m(\psi)\, r^{m}, \tag{18}$$

where $a_m(\psi)$ is a constant that depends only on $m$ and $\psi$, but not on $r$ or the sampling probability $p$, and where $a_1(\psi) = 1$.

To prove that Eqn. (18) is necessary, we assume that the network is closed under random sampling of nodes and write $r = E(k)$ and $g_m(r, \psi) = E(k^{[m]})$. Because of Eqns. (3) and (4) we can immediately write

$$g_m(pr, \psi) = p^{m} g_m(r, \psi) \tag{19}$$

and

$$\frac{g_m(pr, \psi)}{(pr)^{m}} = \frac{g_m(r, \psi)}{r^{m}}. \tag{20}$$

Thus $g_m(r, \psi)/r^{m} = \text{const.}$ (for all $r$), or

$$g_m(r, \psi) = a_m(\psi)\, r^{m}, \tag{21}$$

with $a_1(\psi) = 1$ as required.

To prove sufficiency, assume that the descending factorial moments $E[k^{[m]}]$ fulfil Eqn. (18). Using Eqn. (4), the descending factorial moments of the nodal degrees in the subnet follow the relationship

$$E_S\big[l^{[m]}\big] = p^{m} a_m(\psi)\, r^{m} = a_m(\psi)\,(pr)^{m}. \tag{22}$$

Since the descending factorial moments determine the moments $E(k^{m})$ of a degree distribution, which in turn determine the distribution uniquely (by assumption), the degree distribution of the subnet is of the same type as the degree distribution of the global network, but with the rescaled parameter $(pr, \psi)$. Thus Eqn. (18) is a necessary and sufficient condition for a network ensemble to be closed under random sampling of nodes. □
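Condition (18) can be probed numerically by computing $a_m = E[k^{[m]}]/r^m$ for several members of a family and checking whether it changes with the mean $r$. The Python sketch below does this, assuming truncation at $k \le 150$. Besides the Poisson family (where $a_m = 1$), it uses a zero-truncated Poisson family, an illustrative choice that is not discussed in the text: there $a_2 = 1 - e^{-\lambda}$ varies with the mean, so by Eqn. (18) that family is not closed under random sampling when orphaned nodes are retained.

```python
import math


def poisson_pmf(lam, k_max=150):
    """Poisson(lam) probabilities for k = 0..k_max."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(k_max + 1)]


def zero_truncated(pmf):
    """Condition on k >= 1 (remove Pr(0) and renormalise)."""
    z = 1.0 - pmf[0]
    return [0.0] + [q / z for q in pmf[1:]]


def a_m(pmf, m):
    """Ratio E[k^[m]] / r^m from condition (18)."""
    r = sum(k * q for k, q in enumerate(pmf))
    fm = sum(math.perm(k, m) * q for k, q in enumerate(pmf))
    return fm / r**m


for lam in (1.0, 3.0, 6.0):
    po = poisson_pmf(lam)
    zt = zero_truncated(po)
    # Poisson: a_1 = a_2 = a_3 = 1 for every lam; zero-truncated: a_2 = 1 - exp(-lam)
    print(lam, [round(a_m(po, m), 6) for m in (1, 2, 3)], round(a_m(zt, 2), 6))
```

Because the zero-truncated family has a single parameter, a closed family would require $a_2$ to be one fixed number; its dependence on $\lambda$ therefore rules closure out.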



C. Analytical Examples 

We can use relationships (16) and (18) to determine whether a degree distribution is closed under random sampling. We will discuss this for three commonly observed degree distributions. Note that we only consider a degree distribution to be closed under (random) sampling if the degree distributions of the network and the subnet belong to the same family of probability distributions.

Classical random graphs have a Poisson degree distribution, $\mathrm{Po}(\lambda)$. It is straightforward to show that the descending factorial moments of Poisson-distributed random variables are given by

$$E\big[k^{[m]}\big] = r^{m} = \lambda^{m}. \tag{23}$$

Thus $a_m = 1$ for all $m \ge 1$, and the degree distribution of classical random graphs is closed under random sampling of nodes. If we therefore have a subnet $S$ with $N_S$ nodes drawn from a larger network $\mathcal{N}$ of known size $N$ (so that $p \approx N_S/N$), we can determine $\lambda$ from the average degree $\lambda_S$ of the subnet as $\lambda = \lambda_S N/N_S$. The subnet is therefore informative about the global network.

Networks which grow by random attachment of new nodes give rise to exponential degree distributions such that asymptotically (large $N$) $\Pr(k) = (1 - e^{-\alpha})e^{-k\alpha}$. For such a distribution it is easily shown that

$$E\big[k^{[m]}\big] = m!\,\frac{e^{-m\alpha}}{(1 - e^{-\alpha})^{m}} = m!\, r^{m}, \tag{24}$$

since $E[k] = e^{-\alpha}/(1 - e^{-\alpha})$. This means that $E[k^{[m]}]$ can be written in the form specified by Eqn. (18), with $a_m = m!$, and therefore exponential degree distributions are closed under random sampling. Binomial (as for classical finite-sized random graphs) and negative binomial distributions are also closed under random sampling, as is easily verified. An explicit construction of probability distributions which are closed is discussed in Appendix A.
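To illustrate Eqn. (24) numerically, the following Python sketch checks that $a_m = m!$ for the exponential (geometric) degree distribution and that binomial sampling with probability $p$ returns an exponential distribution whose mean is $pr$. It assumes truncation at $k \le 400$, and the values of $\alpha$ and $p$ are illustrative.

```python
import math


def geometric_pmf(alpha, k_max):
    """Pr(k) = (1 - e^-alpha) e^(-k alpha) for k = 0..k_max."""
    q = math.exp(-alpha)
    return [(1 - q) * q**k for k in range(k_max + 1)]


def thin(pmf, p, l_max):
    """Pr_S(l) of Eqn. (2) for l = 0..l_max."""
    return [sum(math.comb(k, l) * p**l * (1 - p)**(k - l) * pmf[k] for k in range(l, len(pmf)))
            for l in range(l_max + 1)]


def factorial_moment(pmf, m):
    """E[k^[m]] = sum_k k(k-1)...(k-m+1) Pr(k)."""
    return sum(math.perm(k, m) * q for k, q in enumerate(pmf))


alpha, p = 0.5, 0.3                      # illustrative values
pmf = geometric_pmf(alpha, 400)          # truncation; remaining tail mass ~ e^-200
r = factorial_moment(pmf, 1)             # = e^-alpha / (1 - e^-alpha)

# Eqn. (24): a_m = E[k^[m]] / r^m = m!
print([factorial_moment(pmf, m) / r**m for m in range(1, 5)])

# Subnet: exponential again, with mean p r, i.e. ratio q' = p r / (1 + p r)
sub = thin(pmf, p, 30)
q_s = p * r / (1 + p * r)
expected = [(1 - q_s) * q_s**l for l in range(31)]
print(max(abs(a - b) for a, b in zip(sub, expected)))
```

The comparison distribution follows from Eqn. (14): the PGF $(1-q)/(1-qs)$ evaluated at $1-p+ps$ is again of the form $(1-q')/(1-q's)$.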

If the probability of attaching to a node is proportional to its degree, the resulting network will asymptotically have a power-law degree distribution with exponent 3 [12]. For models where an existing node is duplicated and each of its connections is kept with a certain probability, degree distributions will also be power-law-like, but with exponents $2 < \gamma < 3$ [19].

We first consider the sampling properties of network ensembles with degree distribution given by an exact power law, $\Pr(k) = k^{-\gamma}/\zeta(\gamma)$ for $k \ge 1$. In the asymptotic limit, $N \to \infty$, all moments of order $m \ge \gamma - 1$ diverge and we therefore have to use the PGF formalism. The PGF for the global network is given by

$$G(s;\gamma) = \frac{1}{\zeta(\gamma)}\sum_{k=1}^{\infty} k^{-\gamma} s^{k}, \tag{25}$$

and since $k = 0$ is explicitly forbidden in a scale-free network, we use Eqn. (17) to construct the PGF in the subnet, whence

$$G_S^{*}(s;\gamma) = \frac{\sum_{k=1}^{\infty}\left[(1-p+ps)^{k} - (1-p)^{k}\right] k^{-\gamma}}{\zeta(\gamma) - \sum_{k=1}^{\infty} (1-p)^{k} k^{-\gamma}}. \tag{26}$$

Clearly for $p \to 1$ we obtain the original PGF, $G(s;\gamma)$. For $0 < p < 1$, however, it is impossible to determine an exponent $\gamma'$ such that $G_S^{*}$ could be written in terms of the PGF of a power law. Therefore random subnets drawn from exact scale-free networks are not themselves scale-free. This can also be shown explicitly using a series expansion [17]. We note, however, that the tail of the degree distribution of the subnet still takes on a power-law form for $k$ sufficiently large. The same analysis applied to other fat-tailed probability distributions shows that fat-tailed degree distributions such as the log-normal and the stretched exponential families [20] are not closed under random sampling either.
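A rough numerical illustration in Python follows. It assumes the exact power law is cut off at a large maximum degree `k_max` (a computational convenience, not part of the model) and uses the illustrative values $\gamma = 3$ and $p = 0.5$. The sketch evaluates Eqn. (2) and reports the local log-log slope $\log_2[\mathrm{Pr}_S(2l)/\mathrm{Pr}_S(l)]$. For a pure power law this slope would be the same at every $l$; for the subnet it is noticeably steeper at small $l$ and approaches $-\gamma$ only in the tail. The normalising constants ($\zeta(\gamma)$ and the orphan-removal factor of Eqn. (26)) cancel in the ratio, so unnormalised sums suffice.

```python
import math


def subnet_weight(l, p, gamma, k_max):
    """Unnormalised Pr_S(l) from Eqn. (2) for Pr(k) proportional to k^-gamma, k = 1..k_max.

    Binomial weights are evaluated in log space to avoid overflow for large k.
    """
    total = 0.0
    for k in range(max(l, 1), k_max + 1):
        log_w = (math.lgamma(k + 1) - math.lgamma(l + 1) - math.lgamma(k - l + 1)
                 + l * math.log(p) + (k - l) * math.log1p(-p) - gamma * math.log(k))
        total += math.exp(log_w)
    return total


def local_slope(l, p, gamma, k_max=5000):
    """log2 of Pr_S(2l)/Pr_S(l); normalising constants cancel in the ratio."""
    ratio = subnet_weight(2 * l, p, gamma, k_max) / subnet_weight(l, p, gamma, k_max)
    return math.log2(ratio)


gamma, p = 3.0, 0.5
slopes = {l: local_slope(l, p, gamma) for l in (1, 2, 4, 8, 16, 32, 64)}
for l, s in slopes.items():
    print(f"l = {l:2d}: local slope = {s:.3f}")
```

The steeper slope at $l = 1$ is the excess of weakly connected nodes described in the text, and the slope near $-3$ at large $l$ reflects the retained power-law tail.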



D. Numerical Examples 

The effect of random sampling on the degree distribution is most straightforwardly illustrated using numerical solutions of Eqns. (2) and (26). Here we do this for networks of infinite size and, for simplicity, focus on the canonical models of the classical random graph and the exact scale-free network, respectively.

In part (a) of Figure 2 we show the Poisson distribution with $\lambda = 5$ and the distributions of random subnets with $p = 0.8$ and $p = 0.2$, respectively. The subnet distributions are identical to the Poisson distributions with parameters $p\lambda = 4$ and $p\lambda = 1$. This also means that as $p\lambda$ becomes smaller than one the subnet will move through the phase transition where the giant connected component dissolves and the size distribution of connected parts of the subnet becomes exponential.
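A quick check of the statement for panel (a): the Python sketch below thins a Poisson(5) degree distribution with $p = 0.8$ and $p = 0.2$ via Eqn. (2) and compares each result with a Poisson distribution of mean $p\lambda$. It assumes truncation at $k \le 100$; the function names are hypothetical.

```python
import math


def poisson_pmf(lam, k_max=100):
    """Poisson(lam) probabilities for k = 0..k_max."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(k_max + 1)]


def thin(pmf, p):
    """Pr_S(l) of Eqn. (2)."""
    n = len(pmf)
    return [sum(math.comb(k, l) * p**l * (1 - p)**(k - l) * pmf[k] for k in range(l, n))
            for l in range(n)]


lam = 5.0
full = poisson_pmf(lam)
max_err = {}
for p in (0.8, 0.2):
    sub = thin(full, p)
    target = poisson_pmf(p * lam)            # Poisson with mean p * lam
    max_err[p] = max(abs(a - b) for a, b in zip(sub, target))
    print(f"p = {p}: subnet mean = {sum(l * q for l, q in enumerate(sub)):.6f}, "
          f"max |Pr_S - Po(p*lam)| = {max_err[p]:.2e}")
```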

In part (b) of the same figure we show the power-law distribution with $\gamma = 3$ and again the respective subnet degree distributions (renormalized such that $\mathrm{Pr}_S(0) = 0$ in the subnet). We find that the subnet degree distributions are no longer straight lines but that, as $k$ becomes large, they run parallel to the original distribution. That is, as already described above, the tails of degree distributions of subnets sampled randomly from scale-free networks also fall off in the same power-law fashion as the original network. At low connectivities, however, the departure from the scale-free network is quite pronounced: probability mass moves from the tail towards the weakly connected nodes with $k = 1$, which become more abundant than would be expected for a true scale-free network. This will have quite considerable effects for finite-size networks. The deviation of the subnet degree distribution from a pure power law at small to intermediate connectivities increases with $\gamma$ (as well as, naturally, with decreasing sampling probability $p$). We note, however, that the tail of the degree distribution will retain a power-law form; thus, under an alternative definition of scale-free behaviour which only requires $\mathrm{Pr}_{\mathcal{N}}(k) \propto k^{-\gamma}$ for $k \to \infty$, random subnets retain scale-free behaviour in the sense that the tail is still described by a power law, $\mathrm{Pr}_S(k) \propto k^{-\gamma}$ for $k \to \infty$. In general, however, when the whole degree distribution is considered, scale-free networks are not closed under random sampling.

FIG. 2: Degree distributions of the full network and of subnets obtained by sampling each node with probability $p = 0.8$ and $p = 0.2$, respectively, for classical random graphs (a) and scale-free networks (b).



IV. CONNECTIVITY DEPENDENT SAMPLING 



There is no unique and obvious way in which the probability of sampling a node may depend on its connectivity. Here we briefly outline the behaviour of the degree distribution under the simple scheme outlined above, where the probability of sampling a node is no longer uniform but linearly proportional to its connectivity, i.e. $\pi(k) \propto k$; we assume that $\pi(k)$ is given by Eqn. (10).

For a Poisson degree distribution with parameter $\lambda$ we have $E[k^{2}] = \lambda^{2} + \lambda$ and $E[M] = N\lambda/2$ (assuming the network is large and finite). If we set $C = 1/(2E[M])$ in Eqn. (10), Eqn. (11) gives $\bar{p} = (\lambda + 1)/(N\lambda)$, and Eqn. (8) gives

$$\mathrm{Pr}_\pi(k) = e^{-\lambda}\,\frac{\lambda^{k-1}}{(k-1)!}, \qquad k \ge 1. \tag{27}$$

In this case Eqn. (9) becomes

$$\mathrm{Pr}_S(l) = \sum_{k \ge l} \binom{k}{l} \bar{p}^{\,l} (1-\bar{p})^{k-l}\, \mathrm{Pr}_\pi(k) \tag{28}$$

for $l = 0, 1, \ldots$. The distribution in the subnet is thus not a pure Poisson distribution, but a Poisson distribution with parameter $\bar{p}\lambda$ multiplied by a factor $1 - \bar{p} + l/\lambda$. Under this connectivity-dependent sampling, classical random graphs are therefore not closed, and subnets $S$ are qualitatively (if perhaps only rather slightly) different from the overall network $\mathcal{N}$.
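Eqn. (28) and the closed form stated above can be compared numerically. In the Python sketch below, $\lambda$ and $\bar p$ are illustrative values: for a large network $\bar p = (\lambda+1)/(N\lambda)$ is tiny, and a larger value is used here only to make the extra factor visible. The distribution is truncated at $k \le 100$. The closed form used for comparison, $\mathrm{Pr}_S(l) = e^{-\bar p\lambda}(\bar p\lambda)^l/l!\,(1-\bar p+l/\lambda)$, follows because a degree drawn from Eqn. (27) is one plus a Poisson($\lambda$) variable. Binomial sampling therefore turns it into a Bernoulli($\bar p$) variable plus a Poisson($\bar p\lambda$) variable.

```python
import math


def shifted_poisson_pmf(lam, k_max):
    """Pr_pi(k) of Eqn. (27): e^-lam lam^(k-1)/(k-1)! for k >= 1, zero for k = 0."""
    return [0.0] + [math.exp(-lam + (k - 1) * math.log(lam) - math.lgamma(k))
                    for k in range(1, k_max + 1)]


def thin(pmf, p):
    """Binomial sampling with retention probability p, as in Eqns. (9) and (28)."""
    n = len(pmf)
    return [sum(math.comb(k, l) * p**l * (1 - p)**(k - l) * pmf[k] for k in range(l, n))
            for l in range(n)]


lam, p_bar = 4.0, 0.3                    # illustrative values only
sub = thin(shifted_poisson_pmf(lam, 100), p_bar)      # Eqn. (28)

mu = p_bar * lam
closed = [math.exp(-mu + l * math.log(mu) - math.lgamma(l + 1)) * (1 - p_bar + l / lam)
          for l in range(len(sub))]
print(max(abs(a - b) for a, b in zip(sub, closed)))
```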

For scale-free networks with $\gamma < 3$ the second moment diverges, $E[k^{2}] \to \infty$, and we therefore focus on finite (though potentially very large) networks. Networks with a power-law degree distribution can, for example, be constructed using standard methods [21, 22, 23]. For such a scale-free graph with $N$ nodes we have to evaluate numerically the expected number of edges,

$$E[M] = \frac{N}{2}\sum_{k=1}^{N-1} \frac{k^{-\gamma+1}}{\zeta(\gamma)},$$

and $\bar{p}$, given by Eqn. (11). For $\mathrm{Pr}_\pi(k)$ we obtain for scale-free networks

$$\mathrm{Pr}_\pi(k) = \frac{k^{-(\gamma-1)}}{\zeta(\gamma-1)}. \tag{29}$$

Proportional sampling from a scale-free network defined by a power-law exponent $\gamma$ is thus identical to random sampling, with probability $\bar{p}$, from a network with power-law exponent $\gamma - 1$. Therefore we can use the results obtained above and conclude that the scale-free network (in the strict sense outlined above) is not closed under proportional sampling of nodes; for sufficiently large degrees, however, the tail of the degree distribution will still have a power-law form.
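Eqns. (11) and (29) can be evaluated for a finite network as sketched below in Python. The network size and exponent are illustrative values. Degrees are assumed to run over $1 \le k \le N-1$, and the power law is normalised over that finite range rather than by $\zeta(\gamma)$; with this convention Eqn. (29) holds with $\zeta(\gamma-1)$ replaced by the corresponding finite sum.

```python
def power_law(gamma, k_max):
    """Exact power law Pr(k) = k^-gamma / Z on k = 1..k_max, with Z the finite normalising sum."""
    z = sum(k ** -gamma for k in range(1, k_max + 1))
    return {k: k ** -gamma / z for k in range(1, k_max + 1)}


n_nodes, gamma = 10_000, 2.5                  # illustrative values
pr = power_law(gamma, n_nodes - 1)            # degrees 1..N-1

mean_k = sum(k * q for k, q in pr.items())
mean_k2 = sum(k * k * q for k, q in pr.items())
expected_edges = n_nodes * mean_k / 2         # E[M] = N E[k] / 2
C = 1 / (2 * expected_edges)                  # C = 1/(2 E[M]) in Eqn. (10)
p_bar = C * mean_k2 / mean_k                  # Eqn. (11)

# Eqn. (8) with pi(k) = C k; C cancels on normalisation, leaving exponent gamma - 1 (Eqn. (29)).
z_pi = sum(C * k * q for k, q in pr.items())
pr_pi = {k: C * k * q / z_pi for k, q in pr.items()}
shifted = power_law(gamma - 1, n_nodes - 1)

print(f"E[M] = {expected_edges:.1f}, p_bar = {p_bar:.5f}")
print(max(abs(pr_pi[k] - shifted[k]) for k in pr))
```

Because $E[k^2]$ grows with the cut-off when $\gamma < 3$, $\bar p$ computed this way depends on $N$, which is why the text restricts this case to finite networks.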



V. PROTEIN INTERACTION NETWORK DATA 

In Figure 3 we show three degree distributions corresponding to the protein interaction network (PIN) data from E. coli which were available in April 2003, 2004 and 2005 in the Database of Interacting Proteins (DIP; dip.doe-mbi.ucla.edu); the resulting networks are made up of the interactions among 228, 373 and 480 proteins and have 293, 515 and 760 interactions, respectively. Figure 3 confirms the results of the theoretical analysis presented above: as the fraction of sampled network nodes decreases, statistical weight shifts from the tail towards lower degrees. The degree of the single highly connected node, $k = 54$, was already known in the 2003 dataset (no further interactions have been added to this node since). The statistical weight of sparsely connected nodes, $k = 1$, increases as the fraction of sampled nodes decreases. We note that the present data sample only a small subnet of the E. coli PIN, which consists of interactions among approximately 4000 proteins. Moreover, (i) it is well established that PIN data are highly unreliable and very noisy, and (ii) the true sampling process underlying the data will generally be more complicated than the first-order model employed here. The behaviour appears, however, to be qualitatively similar to that predicted by our theoretical analysis.

[Figure 3 plot; horizontal axis: degree k]

FIG. 3: Degree distributions of protein interaction network data available for E. coli in April of the years 2003, 2004 and 2005, respectively. As the fraction of sampled nodes/proteins decreases, statistical weight is shifted from the tail towards lower degrees.



VI. CONCLUSION 

Both sampling schemes discussed here are necessarily simpler than is the case in many real situations, such as the analysis of protein interaction networks (see e.g. […]). We believe, however, that between them they retain some vestiges of reality. Crucially, we wish to stress the incomplete nature of many network data sets. For many of these data sets, in fact, including protein interaction network data, it appears that some form of random sampling is more realistic than a process in which the neighbourhood of a node is explored and neighbouring sites are recruited iteratively into the experimental setup. No matter what the sampling process is, it has to be included in the analysis from the outset: making inferences from incomplete (in the sense that not all nodes have been sampled) network data may give misleading results. If a network is closed under random (or connectivity-dependent) sampling, then it is straightforward to infer properties of the overall network from the subnet. For some networks, notably Erdős–Rényi random graphs, this is indeed the case. In general, however, the degree distributions of the network and of sampled subnets will be qualitatively different. For example, while power-law tails will also give rise to power-law tails in the subnet, a network which has an exact power-law degree distribution is not closed under random sampling. The same is true for other broad-tailed degree distributions such as log-normal or stretched exponential distributions.

Sampling properties will also affect other network 
statistics, including network diameter and average path 
length, clustering coefficient and network motifs. These 
will be studied in a companion paper. We believe that 
sampling properties ought to be included explicitly and 
from the outset into any network analysis, unless there 
is good evidence that the whole (or the majority) of the 
network's nodes have been included in the data. Quite 
apart from the relevance of this work in the analysis of 
real data we believe that a detailed analysis of sampling 
properties of graphs is a rich field which, surprisingly, 
appears to have been neglected thus far. 
APPENDIX A: CONSTRUCTION OF CLOSED DEGREE DISTRIBUTIONS

We have shown that Eqn. (18) is both a necessary and sufficient condition for a degree distribution to be closed under binomial random sampling. We can also use Eqn. (18) to construct closed distributions de novo, as any series of positive numbers $a_k$, $k = 1, 2, \ldots$, with $a_1 = 1$ defines a family of random variables closed under binomial sampling via the condition

$$E\big[k^{[m]}\big] = a_m r^{m} \tag{A1}$$

for some $r \in T = [0, t]$ and $t > 0$.

First, the degenerate distribution $\Pr(k = 0) = 1$ is defined by $E[k^{[m]}] = 0$ for all $m > 0$. Therefore $0$ must be in the interval $T$, and $T$ is non-empty, as the degenerate distribution is trivially closed under binomial sampling. Now assume that $r > 0$ defines the distribution of $k$ through Eqn. (A1). Any $r^{*}$ with $0 < r^{*} < r$ defines the degree distribution after binomial sampling of nodes from $k$ with probability $p = r^{*}/r$, which, by construction, has degree distribution given by $E[l^{[m]}] = a_m (r^{*})^{m}$. The distributions defined by Eqn. (A1) are therefore closed under random sampling of nodes.

Eqn. (A1) can be used to construct arbitrary degree distributions which are closed under binomial sampling. Nontrivial examples are possible; for example

$$a_k = (k+1)\,2^{-k}, \qquad k = 1, 2, \ldots \tag{A2}$$

defines a distribution closed under random sampling,

$$\Pr(k) = \frac{(r/2)^{k}}{k!}\left(k + 1 - \frac{r}{2}\right)e^{-r/2}, \tag{A3}$$

where $r = E[k] \in [0, 2]$ (note that for $r = 2$, $k - 1$ with $k$ distributed according to Eqn. (A3) is Poisson distributed).
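The following Python sketch checks this appendix example numerically, assuming truncation at $k \le 80$ and illustrative values of $r$ and $p$. It uses the closed form $\Pr(k) = \frac{(r/2)^k}{k!}(k+1-r/2)e^{-r/2}$, $0 \le r \le 2$, whose PGF is $(1 + r(s-1)/2)\,e^{r(s-1)/2}$. The sketch confirms that the descending factorial moments equal $a_m r^m$ with $a_m = (m+1)2^{-m}$ as in (A1) and (A2), and that binomial sampling with probability $p$ maps the member with mean $r$ onto the member with mean $pr$.

```python
import math


def a3_pmf(r, k_max=80):
    """Pr(k) = (r/2)^k / k! * (k + 1 - r/2) * e^(-r/2), valid for 0 <= r <= 2."""
    mu = r / 2
    return [mu**k / math.factorial(k) * (k + 1 - mu) * math.exp(-mu) for k in range(k_max + 1)]


def factorial_moment(pmf, m):
    """E[k^[m]] = sum_k k(k-1)...(k-m+1) Pr(k)."""
    return sum(math.perm(k, m) * q for k, q in enumerate(pmf))


def thin(pmf, p):
    """Binomial sampling of Eqn. (2)."""
    n = len(pmf)
    return [sum(math.comb(k, l) * p**l * (1 - p)**(k - l) * pmf[k] for k in range(l, n))
            for l in range(n)]


r, p = 1.6, 0.35                         # illustrative values with r in [0, 2]
pmf = a3_pmf(r)
for m in range(1, 5):
    print(m, factorial_moment(pmf, m), (m + 1) * 2.0**-m * r**m)   # (A1) with (A2)

sampled = thin(pmf, p)
print(max(abs(a - b) for a, b in zip(sampled, a3_pmf(p * r))))     # closed under sampling
```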